EDA properties

Authors
Affiliation

Name I, First Name I

Name of the University

Name II, First Name II

Published

April 29, 2024

Abstract

The following machine learning project focuses on…

1 Introduction

  • Overview and Motivation
  • Related Work
  • Research questions

2 Testing that R and Python both work

print('hello')
[1] "hello"
# Python code
import numpy as np
print(np.mean([10, 20, 30, 40, 50]))
30.0

3 Data

  • Sources
  • Description
  • Wrangling/cleaning
  • Spotting mistakes and missing data (could be part of EDA too)
  • Listing anomalies and outliers (could be part of EDA too)
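library(here)
# All input files are read from the project's data/ folder
directory <- here("data")

# Property listings
properties <- read.csv(file.path(directory,"properties.csv"))

# File whose fields are separated by semicolons
metrecarré <- read.csv(file.path(directory,"snb-data-plimoincha-fr-all-20240321_0900.csv"))
# Drop the first row
metrecarré <- metrecarré[-1, ]

# Split every entry on ";" to recover the individual fields
split_data <- lapply(metrecarré, function(x) unlist(strsplit(as.character(x), ";")))

# Stack the split pieces into a matrix (warns when the pieces differ in length)
split_df <- do.call(rbind, split_data)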
Warning in (function (..., deparse.level = 1) : number of columns of result is
not a multiple of vector length (arg 2)
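The warning above appears because the pieces produced by strsplit() do not all have the same length. A possible alternative, sketched here under assumptions, is to let read.csv() split the fields directly with sep = ";". The sample text, its column names, and the single metadata line to skip are hypothetical; the layout of the real file has to be checked before choosing skip.

```r
# Hypothetical sample mimicking a semicolon-delimited export with one
# leading metadata line; the real file's layout may differ.
raw_text <- c(
  "metadata line to skip",
  "Date;Region;Value",
  "2023-Q1;A;101.5",
  "2023-Q2;A;102.0",
  "2023-Q3;B;99.8"
)

# sep = ";" splits the fields while reading; skip = 1 drops the metadata line
parsed <- read.csv(text = raw_text, sep = ";", skip = 1,
                   stringsAsFactors = FALSE)

# Inspect the resulting columns and their types
str(parsed)
```

Reading the file this way returns a regular data frame with one column per field, so no manual splitting or rbind() step is needed.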
#install.packages("readxl")
library(readxl)

# Read the cpi file
cpi_data_excel <- read_excel(file.path(directory,"cpi.xlsx"))
Warning: Expecting numeric in B4 / R4C2: got a date
(the same warning is repeated for cells C4 to H4)
Warning: Coercing numeric to date in I1024 / R1024C9
(the same warning is repeated for every cell in column I, rows 1024 to 1242; column J, rows 1084 to 1242; column K, rows 1144 to 1242; and column L, rows 1204 to 1242)
New names:
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
# Write the data to a CSV file
write.csv(cpi_data_excel, file.path(directory,"cpi.csv"), row.names = FALSE)
# Load the CSV file into a data frame
cpi <- read.csv(file.path(directory,"cpi.csv"))


# Read the empty_dwellings file
empty_excel <- read_excel(file.path(directory,"empty_dwellings_by_canton_1999-2023.xlsx"))
New names:
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
# Write the data to a CSV file
write.csv(empty_excel, file.path(directory,"empty_dwellings.csv"), row.names = FALSE)
# Load the CSV file into a data frame
empty_dwellings <- read.csv(file.path(directory,"empty_dwellings.csv"))
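The "New names" messages indicate that most cells in the first spreadsheet row are empty, which usually means the sheet starts with a title rather than a header. The following base R sketch shows one way to promote the real header row after import. It assumes that the header is the first row without missing cells; the small data frame and its values are hypothetical stand-ins, not the content of the actual files.

```r
# Hypothetical stand-in for a sheet imported with a title row on top:
# the first column name is the title and the others were auto-named.
sheet <- data.frame(
  c(NA, "Canton", "AA", "BB"),
  c(NA, "1999", "10", "20"),
  c(NA, "2000", "30", "40"),
  stringsAsFactors = FALSE
)
names(sheet) <- c("Table title", "...2", "...3")

# Assumption: the real header is the first row without missing cells
header_row <- which(complete.cases(sheet))[1]

# Keep only the rows below the header and use the header row as names
tidy <- sheet[-seq_len(header_row), , drop = FALSE]
names(tidy) <- unname(unlist(sheet[header_row, ]))
rownames(tidy) <- NULL

# Convert every column except the first (the label column) to numeric
tidy[-1] <- lapply(tidy[-1], as.numeric)

print(tidy)
```

If the position of the header is known in advance, the skip argument of read_excel() can achieve the same result at import time.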

4 Exploratory data analysis

  • Mapping out the underlying structure
  • Identifying the most important variables
  • Univariate visualizations
  • Multivariate visualizations
  • Summary tables

4.1 Change the path below

# Load required libraries
library(ggplot2)
library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
library(here)

4.2 Loading and preliminary cleaning (incomplete for now)

# Load the two property files into data frames
properties1 <- read.csv(file.path(directory, "v1_properties.csv"))
properties2 <- read.csv(file.path(directory, "v2_properties.csv"))
# Combine the two data frames (rbind requires identical column names)
properties <- rbind(properties1, properties2)

# Remove rows with missing values
#properties_without_missing <- properties[complete.cases(properties), ]

# Identify the raw values that cannot be parsed as numbers
problematic_values <- properties$number_of_rooms[is.na(as.numeric(properties$number_of_rooms))]

# Strip non-numeric characters from number_of_rooms and convert to numeric (entries left empty become NA)
properties$number_of_rooms <- as.numeric(gsub("[^0-9.]", "", properties$number_of_rooms))

# Strip every non-digit character from price (including decimal points) and convert to numeric
properties$price <- as.numeric(gsub("[^0-9]", "", properties$price))

# Keep only rows with a known price of at least 10000
properties <- properties[!is.na(properties$price) & properties$price >= 10000, ]

# Keep only rows with a known number of rooms below 25
properties <- properties[!is.na(properties$number_of_rooms) & properties$number_of_rooms < 25, ]
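The cleaning steps above strip characters with gsub(), convert to numeric, and then filter by comparison. The short sketch below, built on hypothetical listings, illustrates how missing values behave in such a filter: a comparison involving NA returns NA, and indexing a data frame with NA produces a row filled with NA rather than dropping the listing. Only the column names come from the report; the values are made up.

```r
# Hypothetical listings; the column names mirror the ones used above
listings <- data.frame(
  price = c("1'250'000 CHF", "Price on request", "850000", "9500"),
  number_of_rooms = c("4.5", "3", "n/a", "2"),
  stringsAsFactors = FALSE
)

# Same cleaning pattern as in the report
listings$price <- as.numeric(gsub("[^0-9]", "", listings$price))
listings$number_of_rooms <- as.numeric(gsub("[^0-9.]", "", listings$number_of_rooms))

# A bare comparison returns NA for the unknown price, and indexing with NA
# yields a row filled with NA instead of dropping the listing
naive <- listings[listings$price >= 10000, ]

# Testing for NA explicitly removes that listing
strict <- listings[!is.na(listings$price) & listings$price >= 10000, ]

nrow(naive)   # 3 rows, one of them entirely NA
nrow(strict)  # 2 rows
```

Whether listings with an unknown price or room count should be dropped or kept for later imputation is a modelling decision; the sketch only shows the mechanics.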

4.3 Histogram of prices

histogram_price <- ggplot(properties, aes(x = price)) +
  geom_histogram(binwidth = 100000, fill = "skyblue", color = "red") +
  labs(title = "Distribution of Prices",
       x = "Price",
       y = "Frequency") +
  theme_minimal()
# Convert ggplot object to plotly object
interactive_histogram_price <- ggplotly(histogram_price)
# Display the interactive histogram
interactive_histogram_price

4.4 Histogram of prices for each property type

Note: only prices between 0 and 5,000,000 are shown, so some outliers are excluded.

# Create the ggplot object
histogram <- ggplot(properties, aes(x = price)) +
  geom_histogram(binwidth = 100000, fill = "skyblue", color = "black") +
  facet_wrap(~ property_type, scales = "free", ncol = 2) +
  labs(title = "Distribution of Prices by Property Type",
       x = "Price",
       y = "Frequency") +
  theme_minimal() +
  xlim(0, 5000000)

# Convert ggplot object to plotly object
interactive_histogram <- ggplotly(histogram)

# Display the interactive plot
interactive_histogram

4.5 Histogram of prices for each year category

Note: only prices between 0 and 5,000,000 are shown, so some outliers are excluded.

# Keep only the first 9 characters of year_category (e.g. "1919-1945")
properties$year_category <- substr(properties$year_category, 1, 9)
# Treat year_category as a categorical variable
properties$year_category <- as.factor(properties$year_category)
# Create a histogram of prices for each year category
histogram <- ggplot(properties, aes(x = price)) +
  geom_histogram(binwidth = 100000, fill = "skyblue", color = "black") +
  facet_wrap(~ year_category, scales = "free", ncol = 2) +
  labs(title = "Distribution of Prices by Year Category",
       x = "Price",
       y = "Frequency") +
  theme_minimal() +
  xlim(0, 5000000)
# Convert ggplot object to plotly object
interactive_histogram_year <- ggplotly(histogram)
# Display the interactive plot
interactive_histogram_year

4.6 Histogram of prices for each canton

Note: only prices between 0 and 5,000,000 are shown, so some outliers are excluded.

histogram <- ggplot(properties, aes(x = price)) +
  geom_histogram(binwidth = 100000, fill = "skyblue", color = "black") +
  facet_wrap(~ canton, scales = "free", ncol = 2) +
  labs(title = "Distribution of Prices by Canton",
       x = "Price",
       y = "Frequency") +
  theme_minimal() +
  xlim(0, 5000000)

# Convert ggplot object to plotly object
interactive_histogram <- ggplotly(histogram)

# Display the interactive plot
interactive_histogram

4.7 Histogram of prices for each number of rooms

Note: only prices between 0 and 5,000,000 are shown, so some outliers are excluded.

The graph below shows only properties with fewer than 10 rooms (adjust the filter in the code if needed).

# Preprocess the number_of_rooms column
properties$number_of_rooms <- as.character(properties$number_of_rooms)
# Strip every non-digit character. Caution: "\\D" also removes the decimal
# point, so a value such as 3.5 becomes 35; "[^0-9.]" would keep decimals.
properties$number_of_rooms <- gsub("\\D", "", properties$number_of_rooms)
properties$number_of_rooms <- as.numeric(properties$number_of_rooms)  # Convert to numeric
properties$number_of_rooms <- trunc(properties$number_of_rooms)       # Drop any fractional part
# Keep only properties with fewer than 10 rooms
properties_room <- properties[properties$number_of_rooms < 10, ]

# Create a histogram of prices for each number of rooms
histogram <- ggplot(properties_room, aes(x = price)) +
  geom_histogram(binwidth = 100000, fill = "skyblue", color = "black") +
  facet_wrap(~ number_of_rooms, scales = "free", ncol = 2) +
  labs(title = "Distribution of Prices by Number of Rooms",
       x = "Price",
       y = "Frequency") +
  theme_minimal() +
  xlim(0, 5000000)

# Convert ggplot object to plotly object
interactive_histogram <- ggplotly(histogram)

# Display the interactive plot
interactive_histogram
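
The preprocessing in this subsection strips characters with the pattern "\\D", whereas the cleaning in 4.2 uses "[^0-9.]". A small check on made-up room values (the vector `rooms` is hypothetical) shows how the two patterns treat decimals:

```r
# Made-up room counts, including half rooms and trailing text
rooms <- c("3.5", "4", "5.5 rooms")

# "\\D" removes every non-digit, including the decimal point
rooms_digits_only <- as.numeric(gsub("\\D", "", rooms))
rooms_digits_only    # 35 4 55

# "[^0-9.]" keeps digits and the decimal point
rooms_with_decimals <- as.numeric(gsub("[^0-9.]", "", rooms))
rooms_with_decimals  # 3.5 4.0 5.5
```

If half rooms should be grouped with the integer below, applying `trunc()` to the second result gives 3, 4 and 5.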

4.8 Test Regression

# Perform multiple linear regression
model <- lm(price ~ number_of_rooms + canton + property_type + year_category, data = properties)

# Summarize the regression model
summary(model)

Call:
lm(formula = price ~ number_of_rooms + canton + property_type + 
    year_category, data = properties)

Residuals:
     Min       1Q   Median       3Q      Max 
-2314382  -559822  -108156   241293 22536934 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    431321.2   131379.4   3.283 0.001032 ** 
number_of_rooms                  1357.6      684.4   1.984 0.047317 *  
cantonappenzell-inner-rhoden  -143313.8   402840.2  -0.356 0.722031    
cantonbasel-landschaft         227183.9   129109.2   1.760 0.078511 .  
cantonbasel-stadt              700560.6   155485.5   4.506 6.71e-06 ***
cantonbern                     -10976.8   127130.1  -0.086 0.931196    
cantonfribourg                -169186.4   125962.0  -1.343 0.179261    
cantonglarus                  -266743.8   211576.8  -1.261 0.207440    
cantonlucerne                  350778.4   142862.4   2.455 0.014096 *  
cantonnidwalden                492509.1   346376.2   1.422 0.155098    
cantonobwalden                 616771.6   346124.3   1.782 0.074799 .  
cantonschaffhausen             -97355.0   166801.2  -0.584 0.559467    
cantonschwyz                   651288.9   179930.3   3.620 0.000297 ***
cantonsolothurn               -232439.8   130615.9  -1.780 0.075186 .  
cantonuri                      389540.9   218252.9   1.785 0.074330 .  
cantonvaud                     874966.4   123931.1   7.060 1.81e-12 ***
cantonzug                     1408051.0   159937.1   8.804  < 2e-16 ***
cantonzurich                   693267.1   128649.1   5.389 7.30e-08 ***
property_typeAttic flat        392966.5    80684.6   4.870 1.14e-06 ***
property_typeBifamiliar house  463159.6    63525.0   7.291 3.38e-13 ***
property_typeChalet            846346.1   107094.0   7.903 3.10e-15 ***
property_typeDuplex            104998.1    78283.7   1.341 0.179878    
property_typeFarm house        995929.6   143791.9   6.926 4.67e-12 ***
property_typeLoft              498088.4   459675.6   1.084 0.278591    
property_typeRoof flat           9933.7    96105.5   0.103 0.917678    
property_typeRustic house     1050378.7  1215025.2   0.864 0.387345    
property_typeSingle house     1100195.6    35905.0  30.642  < 2e-16 ***
property_typeTerrace flat      421440.5   143323.8   2.940 0.003287 ** 
property_typeVilla            1379373.4    64604.0  21.351  < 2e-16 ***
year_category1919-1945        -132783.4    87141.3  -1.524 0.127607    
year_category1946-1960        -138372.6    86556.1  -1.599 0.109940    
year_category1961-1970         -34994.7    72378.9  -0.483 0.628760    
year_category1971-1980         -84239.4    63229.3  -1.332 0.182806    
year_category1981-1990         -89926.5    62864.6  -1.430 0.152620    
year_category1991-2000         294006.4    66195.6   4.441 9.06e-06 ***
year_category2001-2005         255690.3    76793.1   3.330 0.000874 ***
year_category2006-2010         288423.4    69891.4   4.127 3.72e-05 ***
year_category2011-2015         315433.1    69742.1   4.523 6.19e-06 ***
year_category2016-2022          93644.7    72462.9   1.292 0.196288    
year_category2016-2024         263519.5    66575.2   3.958 7.62e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1212000 on 7760 degrees of freedom
  (28 observations deleted due to missingness)
Multiple R-squared:  0.2355,    Adjusted R-squared:  0.2317 
F-statistic:  61.3 on 39 and 7760 DF,  p-value: < 2.2e-16
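
To illustrate how the coefficients combine, the sketch below adds up the estimates reported above for one hypothetical listing: a villa in the canton of Zurich, built in 2011-2015, with `number_of_rooms` equal to 5. The listing is invented for illustration; the estimates are copied from the summary table, and the variable names are assumptions.

```r
# Estimates copied from the regression summary above
intercept      <- 431321.2
beta_rooms     <- 1357.6
beta_zurich    <- 693267.1   # cantonzurich
beta_villa     <- 1379373.4  # property_typeVilla
beta_2011_2015 <- 315433.1   # year_category2011-2015

# Hypothetical listing: number_of_rooms = 5, Zurich, villa, built 2011-2015
predicted_price <- intercept + beta_rooms * 5 + beta_zurich +
  beta_villa + beta_2011_2015
predicted_price
```

Each categorical coefficient is a shift relative to the reference level of its factor, so a listing at the reference level of a factor contributes nothing for that factor. With the fitted object at hand, `predict(model, newdata = ...)` should give the same figure up to the rounding of the printed estimates.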

5 Supervised learning

  • Data splitting (if a training/test set split is enough for the global analysis, at least one CV or bootstrap must be used)
  • Two or more models
  • Two or more scores
  • Tuning of one or more hyperparameters per model
  • Interpretation of the model(s)
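
The data-splitting and scoring steps listed above could be sketched as follows: a k-fold cross-validation in base R with two scores (RMSE and MAE). The data are simulated and every name (`toy`, `fold`, `scores`) is hypothetical; this is one possible layout under those assumptions, not the project's final pipeline.

```r
set.seed(123)  # reproducible simulation and folds

# Simulated data standing in for the cleaned properties (values are invented)
n <- 200
toy <- data.frame(number_of_rooms = sample(1:8, n, replace = TRUE))
toy$price <- 200000 + 150000 * toy$number_of_rooms + rnorm(n, sd = 100000)

# Assign each row to one of k folds
k <- 5
fold <- sample(rep(1:k, length.out = n))

# Fit on k - 1 folds, score on the held-out fold
scores <- sapply(1:k, function(i) {
  train    <- toy[fold != i, ]
  held_out <- toy[fold == i, ]
  fit      <- lm(price ~ number_of_rooms, data = train)
  err      <- held_out$price - predict(fit, newdata = held_out)
  c(rmse = sqrt(mean(err^2)), mae = mean(abs(err)))
})

# Average each score across the folds
rowMeans(scores)
```

Other models can be compared by swapping the `lm()` call while keeping the same fold assignment, so that every model is scored on identical held-out sets.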

6 Unsupervised learning

  • Clustering and/or dimension reduction

7 Conclusion

  • Brief summary of the project
  • Take home message
  • Limitations
  • Future work?